AITopics | hardware accelerator

Collaborating Authors

hardware accelerator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality

Neural Information Processing SystemsJun-19-2026, 04:02:11 GMT

Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and interaction through deep learning-based generative models. However, these models impose significant computational demands, making real-time inference challenging on resource-constrained VR devices such as head-mounted displays (HMDs), where latency and power efficiency are critical. To address this challenge, we propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. In addition, we design a custom hardware accelerator that can be integrated into the system-on-chip (SoC) of VR devices to further enhance processing efficiency. Building on these components, we introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms. Experimental results demonstrate that ESCA boosts FovVideoVDP quality scores by up to +0.39 over the best 4-bit baseline, delivers up to 3.36 latency reduction, and sustains a rendering rate of 100 frames per second in endto-end tests, satisfying real-time VR requirements. These results demonstrate the feasibility of deploying high-fidelity codec avatars on resource-constrained devices, opening the door to more immersive and portable VR experiences. Paper website can be found at https://zmzfpc.github.io/ESCA/.

artificial intelligence, machine learning, quantization, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization

Neural Information Processing SystemsJun-11-2026, 05:31:50 GMT

Machine learning solutions are rapidly adopted to enable a variety of key use cases, from conversational AI assistants to scientific discovery. As the adoption of machine learning models becomes increasingly prevalent, the associated lifecycle carbon footprint is expected to increase, including both from training and inference and from AI hardware manufacturing. We introduce CATransformers, the first carbon-aware co-optimization framework for Transformer-based models and hardware accelerators. By integrating both operational and embodied carbon into early-stage design space exploration, CATransformers enables sustainability-driven model architecture and hardware accelerator co-design that reveals fundamentally different trade-offs than latency-or energy-centric approaches. Evaluated across a range of Transformer models, CATransformers consistently demonstrates the potential to reduce total carbon emissions --by up to 30\% -- while maintaining accuracy and latency. We further highlight its extensibility through a focused case study on multi-modal models. Our results emphasize the need for holistic optimization methods that prioritize carbon efficiency without compromising model capability and execution time performance.

machine learning, optimization problem, proceedings, (5 more...)

Neural Information Processing Systems

Industry: Energy (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.60)

Add feedback

Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers

Lu, Jinming, Tian, Jiayi, Zhao, Yequan, Li, Hai, Zhang, Zheng

arXiv.org Artificial IntelligenceDec-11-2025

Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that use a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-Layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.09202

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FPGA or GPU? Analyzing comparative research for application-specific guidance

Purkayastha, Arnab A, Tharwani, Jay, Aggarwal, Shobhit

arXiv.org Artificial IntelligenceNov-11-2025

The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions, each excelling in specific domains. Although there is substantial research comparing FPGAs and GPUs, most of the work focuses primarily on performance metrics, offering limited insight into the specific types of applications that each accelerator benefits the most. This paper aims to bridge this gap by synthesizing insights from various research articles to guide users in selecting the appropriate accelerator for domain-specific applications. By categorizing the reviewed studies and analyzing key performance metrics, this work highlights the strengths, limitations, and ideal use cases for FPGAs and GPUs. The findings offer actionable recommendations, helping researchers and practitioners navigate trade-offs in performance, energy efficiency, and programmability.

application, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.06565

Country: North America > United States (0.47)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality

Zhu, Mingzhi, Shang, Ding, Zhang, Sai Qian

arXiv.org Artificial IntelligenceOct-30-2025

Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and interaction through deep learning-based generative models. However, these models impose significant computational demands, making real-time inference challenging on resource-constrained VR devices such as head-mounted displays, where latency and power efficiency are critical. To address this challenge, we propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. In addition, we design a custom hardware accelerator that can be integrated into the system-on-chip of VR devices to further enhance processing efficiency. Building on these components, we introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms. Experimental results demonstrate that ESCA boosts FovVideoVDP quality scores by up to $+0.39$ over the best 4-bit baseline, delivers up to $3.36\times$ latency reduction, and sustains a rendering rate of 100 frames per second in end-to-end tests, satisfying real-time VR requirements. These results demonstrate the feasibility of deploying high-fidelity codec avatars on resource-constrained devices, opening the door to more immersive and portable VR experiences.

artificial intelligence, machine learning, quantization, (18 more...)

arXiv.org Artificial Intelligence

2510.24787

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

The AI_INFN Platform: Artificial Intelligence Development in the Cloud

Anderlini, Lucio, Bianchini, Giulio, Ciangottini, Diego, Pra, Stefano Dal, Michelotto, Diego, Petrini, Rosa, Spiga, Daniele

arXiv.org Artificial IntelligenceOct-30-2025

Machine Learning (ML) is profoundly reshaping the way researchers create, implement, and operate data-intensive software. Its adoption, however, introduces notable challenges for computing infrastructures, particularly when it comes to coordinating access to hardware accelerators across development, testing, and production environments. The INFN initiative AI_INFN (Artificial Intelligence at INFN) seeks to promote the use of ML methods across various INFN research scenarios by offering comprehensive technical support, including access to AI-focused computational resources. Leveraging the INFN Cloud ecosystem and cloud-native technologies, the project emphasizes efficient sharing of accelerator hardware while maintaining the breadth of the Institute's research activities. This contribution describes the deployment and commissioning of a Kubernetes-based platform designed to simplify GPU-powered data analysis workflows and enable their scalable execution on heterogeneous distributed resources. By integrating offload-ing mechanisms through Virtual Kubelet and the InterLink API, the platform allows workflows to span multiple resource providers, from Worldwide LHC Computing Grid sites to high-performance computing centers like CINECA Leonardo. We will present preliminary benchmarks, functional tests, and case studies, demonstrating both performance and integration outcomes.

cloud computing, machine learning, platform, (13 more...)

arXiv.org Artificial Intelligence

2509.22117

Country: Europe > Italy > Sardinia (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ABMax: A JAX-based Agent-based Modeling Framework

Chaturvedi, Siddharth, El-Gazzar, Ahmed, van Gerven, Marcel

arXiv.org Artificial IntelligenceOct-17-2025

Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, agent-based modeling (ABM) allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of agent-based modeling (ABM), this can pose constraints on certain agent manipulation operations that require flexible data structures. A subset of which is represented by the ability to update a dynamically selected number of agents by applying distinct changes to them during a simulation. To this effect, we introduce ABMax, an ABM framework based on JAX that implements multiple just-in-time (JIT) compilable algorithms to provide this functionality. On the canonical predation model benchmark, ABMax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples in the form of a traffic-flow model and a financial market model to show the use case of ABMax

agent, artificial intelligence, simulation, (15 more...)

arXiv.org Artificial Intelligence

2508.16508

Country:

Europe (0.47)
North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Banking & Finance > Trading (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.91)

Add feedback

Selective Quantization Tuning for ONNX Models

Louloudakis, Nikolaos, Rajan, Ajitha

arXiv.org Artificial IntelligenceJul-17-2025

Quantization is a process that reduces the precision of deep neural network models to lower model size and computational demands, often at the cost of accuracy. However, fully quantized models may exhibit sub-optimal performance below acceptable levels and face deployment challenges on low-end hardware accelerators due to practical constraints. To address these issues, quantization can be selectively applied to only a subset of layers, but selecting which layers to exclude is non-trivial. To this direction, we propose TuneQn, a suite enabling selective quantization, deployment and execution of ONNX models across various CPU and GPU devices, combined with profiling and multi-objective optimization. TuneQn generates selectively quantized ONNX models, deploys them on different hardware, measures performance on metrics like accuracy and size, performs Pareto Front minimization to identify the best model candidate and visualizes the results. To demonstrate the effectiveness of TuneQn, we evaluated TuneQn on four ONNX models with two quantization settings across CPU and GPU devices. As a result, we demonstrated that our utility effectively performs selective quantization and tuning, selecting ONNX model candidates with up to a $54.14$% reduction in accuracy loss compared to the fully quantized model, and up to a $72.9$% model size reduction compared to the original model.

artificial intelligence, machine learning, quantization, (17 more...)

arXiv.org Artificial Intelligence

2507.12196

Country:

North America > United States (0.47)
South America > Brazil > Rio de Janeiro (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search

Abharian, Sanaz Kazemi, Dinakarrao, Sai Manoj Pudukotai

arXiv.org Artificial IntelligenceJul-9-2025

Graph Neural Networks (GNNs) have emerged as a powerful machine learning method for graph-structured data. A plethora of hardware accelerators has been introduced to meet the performance demands of GNNs in real-world applications. However, security challenges of hardware-based attacks have been generally overlooked. In this paper, we investigate the vulnerability of GNN models to hardware-based fault attack, wherein an attacker attempts to misclassify output by modifying trained weight parameters through fault injection in a memory device. Thus, we propose Gradual Bit-Flip Fault Attack (GBFA), a layer-aware bit-flip fault attack, selecting a vulnerable bit in each selected weight gradually to compromise the GNN's performance by flipping a minimal number of bits. To achieve this, GBFA operates in two steps. First, a Markov model is created to predict the execution sequence of layers based on features extracted from memory access patterns, enabling the launch of the attack within a specific layer. Subsequently, GBFA identifies vulnerable bits within the selected weights using gradient ranking through an in-layer search. We evaluate the effectiveness of the proposed GBFA attack on various GNN models for node classification tasks using the Cora and PubMed datasets. Our findings show that GBFA significantly degrades prediction accuracy, and the variation in its impact across different layers highlights the importance of adopting a layer-aware attack strategy in GNNs. For example, GBFA degrades GraphSAGE's prediction accuracy by 17% on the Cora dataset with only a single bit flip in the last layer.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.05531

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

Ahmadifarsani, Samira, Mueller-Gritschneder, Daniel, Schlichtmann, Ulf

arXiv.org Artificial IntelligenceJul-8-2025

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML) compilers remains a complex challenge due to the need for significant modifications in compilation layers and specialized scheduling techniques. Existing frameworks offer partial solutions and require users to navigate intricate compiler internals. In this paper, we introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators. Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators without requiring in-depth knowledge of the underlying compiler. Furthermore, we extend and incorporate design space exploration tools, specifically CoSA, to automate efficient tensor scheduling, accounting for factors such as uneven mapping and double buffering. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.

accelerator, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.04828

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

Add feedback